Terminology in the age of multilingual corpora
Abstract
Terminology management has long played an important role in translation and localisation. It has been asserted, however, that the need for terminology management is declining with the rise of widely accessible aligned multilingual corpora, such as bitexts. In this view, translators will be able to use previous translations to automatically identify the best translation for a term. This article argues, however, that while bitext resources will assist in human-oriented terminology management, they will actually increase the need for skilled terminology work and termbases. Furthermore, because more tools will generate terminological data, the need for exchange between tools will increase. After discussing the case for terminology management and terminology exchange in the age of aligned multilingual corpora, the paper describes the role of the TermBase eXchange (TBX) standard in terminology exchange, including typical scenarios for its use and some of the challenges faced in using it.
Similar resources
TTC TermSuite - A UIMA Application for Multilingual Terminology Extraction from Comparable Corpora
This paper presents TTC TermSuite: a tool suite for multilingual terminology extraction from comparable corpora. This tool suite offers a user-friendly graphical interface for designing UIMA-based tool chains whose components (i) form a functional architecture, (ii) manage 7 languages of 5 different families, (iii) support standardized file formats, (iv) extract single- and multiword ter...
Exploiting a Multilingual Web-based Encyclopedia for Bilingual Terminology Extraction
Multilingual linguistic resources are usually constructed from parallel corpora, but since these corpora are available only for selected text domains and language pairs, the potential of other resources is being explored as well. This article seeks to explore and to exploit the idea of using multilingual web-based encyclopedias such as Wikipedia as comparable corpora for bilingual terminology e...
Normalising the IJS-ELAN Slovene-English Parallel Corpus for the Extraction of Multilingual Terminology
Various efforts have been made for the development of tools and methods dedicated to the automatic processing of multilingual terminology databases. For that purpose, multilingual parallel corpora have been used as a basis resource. However, most of the neologisms in technical and scientific domains are realised by multiword terms that are rarely identified in parallel corpora. In this paper, w...
Mapping WordNet Domains, WordNet Topics and Wikipedia Categories to Generate Multilingual Domain Specific Resources
In this paper we present the mapping between WordNet domains and WordNet topics, and the emergent Wikipedia categories. This mapping leads to a coarse alignment between WordNet and Wikipedia, useful for producing domain-specific and multilingual corpora. Multilinguality is achieved through the cross-language links between Wikipedia categories. Research in word-sense disambiguation has shown tha...
Bilingual terminology extraction: an approach based on a multilingual thesaurus applicable to comparable corpora
This paper presents several methods for exploiting multiple resources in bilingual lexicon extraction, either from parallel or comparable corpora. First, special attention is given to the use of multilingual thesauri, and different search strategies based on such thesauri are investigated. Then, a method to optimally combine the different resources for bilingual lexicon extraction is presente...